Implementation of LU, QR and RNG in Flagon

Author

  • Liping Liu
Abstract

This paper introduces fast LU and QR implementations on the GPU that extend the corresponding LAPACK routines. The implementation outperforms its counterparts by combining a fast GPU matrix-matrix multiplication algorithm, a right-looking technique to parallelize the computation, a look-ahead technique to overlap CPU and GPU computation, and an optimal block size on the GPU. It achieves roughly a 2x to 8x speedup over the LAPACK routines running on the CPU as the number of matrix rows varies from 1,000 to 11,000. The paper also describes in detail how to add custom functionality to Flagon, which lets developers use CUDA code without knowing how to write CUDA. Finally, a GPU random number generator is imported into Flagon.
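The right-looking blocked LU scheme named in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions (a dense square matrix stored as a list of lists, partial pivoting in the style of LAPACK's getrf, and a hypothetical function name `blocked_lu` with block size `nb`); it is not the paper's Flagon or CUDA code. Each outer iteration factors a narrow panel, solves for the matching row block of U, and then applies one large matrix-matrix update to the trailing submatrix. That last step is the GEMM that a GPU implementation would offload.

```python
def blocked_lu(a, nb=2):
    """Right-looking blocked LU with partial pivoting, computed in place.

    a:  n x n list of lists of floats. On return it holds L (unit lower,
        diagonal not stored) and U packed together.
    nb: panel width (hypothetical tuning parameter; the paper tunes its GPU block size).
    Returns perm such that row i of L*U equals row perm[i] of the input.
    """
    n = len(a)
    perm = list(range(n))
    for k in range(0, n, nb):
        end = min(k + nb, n)

        # 1) Panel factorization: unblocked LU on columns k..end-1, rows k..n-1.
        for j in range(k, end):
            p = max(range(j, n), key=lambda r: abs(a[r][j]))
            if a[p][j] == 0.0:
                raise ZeroDivisionError("matrix is singular")
            if p != j:
                # Swapping whole rows applies the pivot to earlier L columns,
                # the current panel, and the not-yet-updated trailing columns.
                a[j], a[p] = a[p], a[j]
                perm[j], perm[p] = perm[p], perm[j]
            for i in range(j + 1, n):
                a[i][j] /= a[j][j]          # multiplier, stored as L[i][j]
                lij = a[i][j]
                for c in range(j + 1, end):  # update remaining panel columns only
                    a[i][c] -= lij * a[j][c]

        # 2) Row block of U: U12 = L11^{-1} A12 (unit lower forward substitution).
        for i in range(k, end):
            for r in range(k, i):
                lir = a[i][r]
                for c in range(end, n):
                    a[i][c] -= lir * a[r][c]

        # 3) Trailing update (the GEMM step): A22 -= L21 * U12.
        for i in range(end, n):
            for r in range(k, end):
                lir = a[i][r]
                for c in range(end, n):
                    a[i][c] -= lir * a[r][c]
    return perm
```

With look-ahead, as the abstract describes it, the next panel could be factored on the CPU as soon as its own columns receive the trailing update, while the GPU finishes the rest of step 3; this sketch simply runs the three steps one after another.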


Similar sources

Numerical and Experimental Analysis of the Effect of Flow Discharge Ratio on Flow Separation at 45 Degree Open End Water Intake

Flow separation at a water intake is the main cause of head loss and flow discharge reduction. As a result, studying the shape and size of the separation zone is essential when designing an optimum water intake. Water intakes are normally built at a 90 degree angle to the main channel flow direction. However, the flow structure in this type of water intake consists of a large separation zone along with ...


Mixing LU and QR factorization algorithms to design high-performance dense linear algebra solvers

This paper introduces hybrid LU–QR algorithms for solving dense linear systems of the form Ax = b. Throughout a matrix factorization, these algorithms dynamically alternate LU with local pivoting and QR elimination steps based upon some robustness criterion. LU elimination steps can be very efficiently parallelized, and are twice as cheap in terms of floating-point operations, as QR steps. Howe...


Linear Algebra Research on the AP

This paper gives a report on various results of the Linear Algebra Project on the Fujitsu AP1000 in 1993. These include the general implementation of Distributed BLAS Level 3 subroutines (for the scattered storage scheme). The performance and user interface issues of the implementation are discussed. Implementations of Distributed BLAS-based LU Decomposition, Cholesky Factorization and Star Pro...


Implementation of LU Decomposition and QR Decomposition on Parallel Processing Systems

One of the earliest attempts to implement LU Decomposition with special-purpose hardware used systolic/wavefront arrays[2]. Different designs for the processing elements (PEs) of systolic/wavefront arrays have been proposed[3][4][5]. These ideas were not implemented in circuits at that time, and the performance of these architectures was not quantitatively evaluated either. In 1994, E. Casseau[6] i...


Publication date: 2009